Abstract. We focus on infinite words with languages closed under reversal. If the frequencies of all factors are well defined, we show that the number of different frequencies of factors of length $n+1$ does not exceed $2\Delta\mathcal{C}(n) + 1$, where $\Delta\mathcal{C}(n)$ is the first difference of the factor complexity $\mathcal{C}(n)$ of the infinite word.

1. Introduction

It is well known that the Rauzy graph, despite its simplicity, has turned out to be a powerful tool in the study of various combinatorial properties of words. The first to label the edges of the Rauzy graph with frequencies was Dekking [8], in order to show that for every length there exist at most three different factor frequencies in the Fibonacci sequence. Moreover, he described, for every length $n$, the set of frequencies of factors of length $n$ and the number of factors of length $n$ having the same frequency. Berthé [3], observing also the evolution of Rauzy graphs for growing factor lengths, generalized Dekking's result to all Sturmian words.^1

With the help of the Rauzy graph, Boshernitzan [5] deduced an upper bound on the number of different frequencies in a general recurrent infinite word. He showed that the number of frequencies of factors of length $n+1$ does not exceed $3\Delta\mathcal{C}(n)$, where $\Delta\mathcal{C}(n)$ is the first difference of the factor complexity of the infinite word.

Since $\Delta\mathcal{C}(n)$ is known to be bounded for infinite words with sublinear complexity (see [6]), this implies, for fixed points of primitive substitutions and for fixed points of uniform substitutions (all images of letters have the same length), that the number of different frequencies of factors of the same length is bounded.

Boshernitzan's upper bound $3\Delta\mathcal{C}(n)$ can be lowered further if the labeled Rauzy graphs corresponding to an infinite word have a nontrivial group of automorphisms. This property of the Rauzy graphs is guaranteed, for example, if the language of the infinite word is closed under reversal or closed under a permutation of letters. The main aim of this paper is to prove the following theorem:
Theorem 1.1. Let $u$ be an infinite word whose language is closed under reversal and such that the frequency $\rho(w)$ exists for every factor $w$ of the word $u$. Then for every $n \in \mathbb{N}$, we have

(1)   $\#\{\rho(e) \mid e \in \mathcal{L}_{n+1}\} \leq 2\Delta\mathcal{C}(n) + 1$,

where $\mathcal{L}_{n+1}$ denotes the set of factors of $u$ of length $n+1$.

We also deduce that equality holds for all sufficiently large $n$ if and only if $u$ is periodic. Nevertheless, a recent result of Ferenczi and Zamboni shows that this bound cannot be improved while keeping its general validity, even for aperiodic words whose languages are closed under reversal. In [10], they study the infinite words coding $k$-interval exchange transformations with the symmetric permutation. The authors show, among other things, that for such infinite words equality in Theorem 1.1 is reached infinitely many times. (In fact, they proved a stronger statement: the set of indices $n$ for which equality holds in (1) has density one.)

Finally, let us mention that the idea of exploiting a symmetry of the Rauzy graph was already used in [2] in order to estimate the number of palindromes of a given length. Our article is intended as a further example of why it is useful to study symmetries in Rauzy graphs.



^1 Note that this result also follows from the three gap theorem, see [12].

2. Preliminaries

An alphabet $\mathcal{A}$ is a finite set of symbols, called letters. A concatenation of letters is a word. The length of a word $w$ is the number of letters contained in $w$ and is denoted $|w|$. The set $\mathcal{A}^*$ of all finite words (including the empty word $\varepsilon$), equipped with the operation of concatenation, is a free monoid. We will also deal with right-sided infinite words $u = u_0u_1u_2\ldots$. A finite word $w$ is called a factor of the word $u$ (finite or infinite) if there exist a finite word $w^{(1)}$ and a word $w^{(2)}$ (finite or infinite) such that $u = w^{(1)}ww^{(2)}$. The factor $w^{(1)}$ is a prefix of $u$ and $w^{(2)}$ is a suffix of $u$. An infinite word $u$ is said to be recurrent if each of its factors occurs infinitely many times in $u$.

The language $\mathcal{L}$ of an infinite word $u$ is the set of all factors of $u$. We denote by $\mathcal{L}_n$ the set of factors of length $n$ of the infinite word $u$. We can then define the complexity function (or complexity) $\mathcal{C} : \mathbb{N} \to \mathbb{N}$, which associates to every $n$ the number of different factors of length $n$ of the infinite word $u$, i.e., $\mathcal{C}(n) = \#\mathcal{L}_n$.

Special factors play an important role in determining the factor complexity. We say that a letter $a$ is a right extension of a factor $w \in \mathcal{L}$ if $wa$ is also a factor of $u$. We denote by $\mathrm{Rext}(w)$ the set of all right extensions of $w$ in $u$, i.e., $\mathrm{Rext}(w) = \{a \in \mathcal{A} \mid wa \in \mathcal{L}\}$. If $\#\mathrm{Rext}(w) \geq 2$, then the factor $w$ is called right special (RS for short). Analogously, we define left extensions, $\mathrm{Lext}(w)$, and left special factors (LS for short). Moreover, we say that a factor $w$ is bispecial (BS for short) if $w$ is both LS and RS.

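With this in hand, we can state a formula for the first difference of complexity $\Delta\mathcal{C}(n) = \mathcal{C}(n+1) - \mathcal{C}(n)$ (taken from [7]):

(2)   $\Delta\mathcal{C}(n) = \sum_{w \in \mathcal{L}_n} \bigl(\#\mathrm{Rext}(w) - 1\bigr) = \sum_{w \in \mathcal{L}_n} \bigl(\#\mathrm{Lext}(w) - 1\bigr), \quad n \in \mathbb{N}$.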
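As a small numerical illustration of formula (2), the following Python sketch computes the factor complexity and its first difference on a finite prefix of the Fibonacci word. The helper names (`fibonacci_prefix`, `factors`, `delta_c_from_right_extensions`) and the prefix length are illustrative assumptions, not part of the paper. On a finite prefix, the two sides of (2) can be expected to agree only for lengths $n$ that are small compared with the prefix.

```python
def fibonacci_prefix(length):
    """Prefix of the Fibonacci word, the fixed point of 0 -> 01, 1 -> 0."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if c == "0" else "0" for c in word)
    return word[:length]


def factors(word, n):
    """Set of factors of length n that occur in the finite word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def delta_c_from_right_extensions(word, n):
    """Right-hand side of (2): sum of (#Rext(w) - 1) over factors w of length n."""
    rext = {}
    for f in factors(word, n + 1):
        rext.setdefault(f[:-1], set()).add(f[-1])
    return sum(len(letters) - 1 for letters in rext.values())


prefix = fibonacci_prefix(2000)  # assumed long enough for the small n below
for n in range(8):
    c_n = len(factors(prefix, n))
    c_next = len(factors(prefix, n + 1))
    # Columns: n, C(n), C(n+1) - C(n), sum over right extensions
    print(n, c_n, c_next - c_n, delta_c_from_right_extensions(prefix, n))
```

For a Sturmian word such as the Fibonacci word, both computations of the first difference should give the same value for every listed $n$.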

A language $\mathcal{L}$ is closed under reversal if, for every factor $w = w_1 \ldots w_n \in \mathcal{L}$, its mirror image $\overline{w} = w_n \ldots w_1$ also belongs to $\mathcal{L}$. A factor $w$ which coincides with its mirror image $\overline{w}$ is called a palindrome.

If we denote by $\mathcal{P}al_n$ the set of palindromes of length $n$ contained in $u$, then we can define the palindromic complexity $\mathcal{P} : \mathbb{N} \to \mathbb{N}$ of the infinite word $u$ by $\mathcal{P}(n) = \#\mathcal{P}al_n$. Clearly, $\mathcal{P}(n) \leq \mathcal{C}(n)$ for any positive integer $n$. A non-trivial inequality between $\mathcal{P}(n)$ and $\mathcal{C}(n)$ can be found in [1]. Here we shall use the result from [2]: if the language of an infinite recurrent word is closed under reversal, then

(3)   $\mathcal{P}(n) + \mathcal{P}(n+1) \leq \Delta\mathcal{C}(n) + 2$.
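Inequality (3) can be checked numerically on a concrete word. The sketch below does so for a finite prefix of the Fibonacci word, whose language is closed under reversal. The function names and the prefix length are assumptions made for this illustration only, and the computed values are reliable only for small $n$.

```python
def fibonacci_prefix(length):
    """Prefix of the Fibonacci word, the fixed point of 0 -> 01, 1 -> 0."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if c == "0" else "0" for c in word)
    return word[:length]


def factors(word, n):
    """Set of factors of length n that occur in the finite word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def palindromic_complexity(word, n):
    """Number of palindromic factors of length n in the finite word."""
    return sum(1 for f in factors(word, n) if f == f[::-1])


prefix = fibonacci_prefix(2000)  # assumed long enough for the small n below
for n in range(8):
    delta_c = len(factors(prefix, n + 1)) - len(factors(prefix, n))
    lhs = palindromic_complexity(prefix, n) + palindromic_complexity(prefix, n + 1)
    # Columns: n, P(n) + P(n+1), Delta C(n) + 2, whether (3) holds
    print(n, lhs, delta_c + 2, lhs <= delta_c + 2)
```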

In this paper, we focus on infinite words with well-defined factor frequencies. More precisely, we will assume that for any factor $w$ of an infinite word $u$, the following limit exists:

$\lim_{|v| \to \infty,\, v \in \mathcal{L}} \frac{\#\{\text{occurrences of } w \text{ in } v\}}{|v|}$.

This limit will be denoted by $\rho(w)$ and called the frequency of the factor $w$. Let us add that an occurrence of $w$ in $v = v_1v_2 \ldots v_m$ is an index $i \leq m$ such that $w$ is a prefix of the word $v_iv_{i+1} \ldots v_m$.
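The definition of an occurrence translates directly into code. The following sketch counts occurrences of a non-empty word $w$ in a finite word $v$ (overlaps allowed) and uses a long prefix of the Fibonacci word to approximate $\rho(w)$. The names `occurrences` and `empirical_frequency` are hypothetical, and the ratio computed on a finite prefix is only an approximation of the limit.

```python
def fibonacci_prefix(length):
    """Prefix of the Fibonacci word, the fixed point of 0 -> 01, 1 -> 0."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if c == "0" else "0" for c in word)
    return word[:length]


def occurrences(w, v):
    """Number of positions i such that the non-empty word w is a prefix of v[i:]."""
    return sum(1 for i in range(len(v) - len(w) + 1) if v.startswith(w, i))


def empirical_frequency(w, v):
    """Ratio #occurrences / |v|, an approximation of rho(w) for long v."""
    return occurrences(w, v) / len(v)


prefix = fibonacci_prefix(5000)
for w in ["0", "1", "00", "01", "10"]:
    print(w, round(empirical_frequency(w, prefix), 4))
```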

To have at hand all the definitions needed for deducing an improved upper bound on the number of different frequencies, it remains to define the labeled Rauzy graph.

The labeled Rauzy graph of order $n$ of an infinite word $u$ is a directed graph $\Gamma_n$ whose set of vertices is $\mathcal{L}_n$ and whose set of edges is $\mathcal{L}_{n+1}$. Any edge $e = w_0w_1 \ldots w_n$ starts in the vertex $w = w_0w_1 \ldots w_{n-1}$, ends in the vertex $v = w_1 \ldots w_{n-1}w_n$, and is labeled by its factor frequency $\rho(e)$.
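A minimal sketch of this construction, assuming that the true frequencies $\rho(e)$ are replaced by empirical frequencies measured on a finite prefix of the Fibonacci word. The function `labeled_rauzy_graph` and its dictionary representation are choices made for this example only.

```python
from collections import Counter


def fibonacci_prefix(length):
    """Prefix of the Fibonacci word, the fixed point of 0 -> 01, 1 -> 0."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if c == "0" else "0" for c in word)
    return word[:length]


def labeled_rauzy_graph(word, n):
    """Return {edge: (source, target, label)} for the Rauzy graph of order n.

    Labels are empirical frequencies of the factors of length n + 1 in the
    finite word, i.e. approximations of the true frequencies rho(e).
    """
    counts = Counter(word[i:i + n + 1] for i in range(len(word) - n))
    total = sum(counts.values())
    return {e: (e[:-1], e[1:], c / total) for e, c in counts.items()}


graph = labeled_rauzy_graph(fibonacci_prefix(5000), 2)
for edge, (source, target, label) in sorted(graph.items()):
    print(f"{source} --{edge}--> {target}  label ~ {label:.4f}")
```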

3. Reduced Rauzy graphs 

Edge frequencies in a Rauzy graph $\Gamma_n$ behave similarly to the current in a circuit. We may formulate an analogue of Kirchhoff's law: the sum of frequencies of edges ending in a vertex equals the sum of frequencies of edges starting in this vertex. As a direct consequence, if a Rauzy graph contains a vertex with only one incoming and one outgoing edge, then these two edges have the same frequency, say $\rho$. Therefore, we can replace this triple (edge-vertex-edge) with a single edge keeping the frequency $\rho$. If we reduce the Rauzy graph step by step by applying this procedure, we obtain the so-called reduced Rauzy graph $\tilde{\Gamma}_n$, which simplifies the investigation of edge frequencies. To make this consideration precise, we introduce the following notion.
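The analogue of Kirchhoff's law can be observed at the level of raw occurrence counts in a finite word. In the sketch below (an illustration under our own naming, with `flow_imbalance` as a hypothetical helper), the incoming and outgoing counts of a vertex can differ by at most one, because only the first and the last factor of length $n$ of the finite word are unbalanced. This boundary effect vanishes in the limit defining the frequencies.

```python
from collections import Counter


def fibonacci_prefix(length):
    """Prefix of the Fibonacci word, the fixed point of 0 -> 01, 1 -> 0."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if c == "0" else "0" for c in word)
    return word[:length]


def flow_imbalance(word, n):
    """For each vertex of order n: incoming minus outgoing occurrence counts."""
    edge_counts = Counter(word[i:i + n + 1] for i in range(len(word) - n))
    imbalance = Counter()
    for e, c in edge_counts.items():
        imbalance[e[1:]] += c   # the edge e ends in the vertex e[1:]
        imbalance[e[:-1]] -= c  # the edge e starts in the vertex e[:-1]
    return dict(imbalance)


prefix = fibonacci_prefix(5000)
for n in range(1, 6):
    worst = max(abs(x) for x in flow_imbalance(prefix, n).values())
    print(n, worst)
```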

Definition 3.1. Let $\Gamma_n$ be the labeled Rauzy graph of order $n$ of an infinite word $u$. A directed path $w^{(0)}w^{(1)} \ldots w^{(m)}$ of non-zero length in $\Gamma_n$ such that its initial vertex $w^{(0)}$ and its final vertex $w^{(m)}$ are LS or RS factors, and the other vertices are neither LS nor RS factors, is called simple. We define the label of a simple path as the label of any edge of this path.

Definition 3.2. The reduced Rauzy graph $\tilde{\Gamma}_n$ of $u$ (of order $n$) is a directed graph whose set of vertices is formed by the LS and RS factors of $\mathcal{L}_n$ and whose set of edges is given in the following way. Vertices $w$ and $v$ are connected by an edge $e$ if there exists in $\Gamma_n$ a simple path starting in $w$ and ending in $v$. We assign to such an edge $e$ the label of the corresponding simple path.
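Definitions 3.1 and 3.2 suggest a direct procedure: start in every LS or RS vertex, take each outgoing edge, and follow the unique continuation until the next LS or RS vertex is reached. The following sketch implements this on a finite prefix of the Fibonacci word. The function `reduced_rauzy_graph` and the representation of a reduced edge as a triple (start, end, first edge of the simple path) are assumptions of this example.

```python
def fibonacci_prefix(length):
    """Prefix of the Fibonacci word, the fixed point of 0 -> 01, 1 -> 0."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if c == "0" else "0" for c in word)
    return word[:length]


def reduced_rauzy_graph(word, n):
    """Edges of the reduced Rauzy graph of order n as (start, end, first_edge)."""
    edges = {word[i:i + n + 1] for i in range(len(word) - n)}
    out_edges, in_edges = {}, {}
    for e in edges:
        out_edges.setdefault(e[:-1], []).append(e)
        in_edges.setdefault(e[1:], []).append(e)
    vertices = set(out_edges) | set(in_edges)
    # A vertex is kept if it is RS (out-degree >= 2) or LS (in-degree >= 2).
    special = {v for v in vertices
               if len(out_edges.get(v, [])) >= 2 or len(in_edges.get(v, [])) >= 2}
    reduced = []
    for start in sorted(special):
        for first in sorted(out_edges.get(start, [])):
            current = first[1:]
            # Follow the unique continuation until the next LS or RS vertex.
            while current not in special and out_edges.get(current):
                current = out_edges[current][0][1:]
            if current in special:
                reduced.append((start, current, first))
    return reduced


prefix = fibonacci_prefix(3000)
for n in range(1, 6):
    print(n, reduced_rauzy_graph(prefix, n))
```

For the Fibonacci word, the reduced graph of order $n$ has two edges when $\mathcal{L}_n$ contains a bispecial factor and three edges otherwise.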

For a recurrent word $u$, at least one edge starts and at least one edge ends in every vertex of $\Gamma_n$. Therefore, no edge label is lost by the reduction of $\Gamma_n$. The number of different edge labels in the reduced Rauzy graph $\tilde{\Gamma}_n$ is clearly less than or equal to the number of edges in $\tilde{\Gamma}_n$. Let us thus calculate the number of edges in $\tilde{\Gamma}_n$ in order to get an upper bound on the number of frequencies of factors in $\mathcal{L}_{n+1}$.

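For every RS factor $w \in \mathcal{L}_n$, exactly $\#\mathrm{Rext}(w)$ edges begin in $w$, and for every LS factor $v \in \mathcal{L}_n$ which is not RS, only one edge begins in $v$. Thus we get the following relation:

(4)   $\#\{e \mid e \text{ edge in } \tilde{\Gamma}_n\} = \sum_{w \text{ RS in } \mathcal{L}_n} \#\mathrm{Rext}(w) + \sum_{v \text{ LS not RS in } \mathcal{L}_n} 1$.

Using Equation (2), we deduce that

(5)   $\#\{e \mid e \text{ edge in } \tilde{\Gamma}_n\} = \Delta\mathcal{C}(n) + \sum_{w \text{ RS in } \mathcal{L}_n} 1 + \sum_{v \text{ LS not RS in } \mathcal{L}_n} 1$.

Since $\#\mathrm{Rext}(w) - 1 \geq 1$ for any RS factor $w$ and, similarly, for LS factors, we have

(6)   $\#\{w \in \mathcal{L}_n \mid w \text{ RS}\} \leq \Delta\mathcal{C}(n)$ and $\#\{w \in \mathcal{L}_n \mid w \text{ LS}\} \leq \Delta\mathcal{C}(n)$.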

The following result, initially proved by Boshernitzan in [5], follows immediately by combining (5) and (6).

Theorem 3.3. Let $u$ be an infinite recurrent word such that for every factor $w \in \mathcal{L}$, the frequency $\rho(w)$ exists. Then for every $n \in \mathbb{N}$, it holds that

$\#\{\rho(e) \mid e \in \mathcal{L}_{n+1}\} \leq 3\Delta\mathcal{C}(n)$.
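For the reader's convenience, the combination of (5) and (6) behind Theorem 3.3 can be written out as a single chain. It only restates the steps above: every frequency is the label of some edge of the reduced Rauzy graph, and each of the two counting terms in (5) is bounded by means of (6).

```latex
\begin{align*}
\#\{\rho(e) \mid e \in \mathcal{L}_{n+1}\}
  &\leq \#\{e \mid e \text{ edge in } \tilde{\Gamma}_n\} \\
  &= \Delta\mathcal{C}(n)
     + \#\{w \in \mathcal{L}_n \mid w \text{ RS}\}
     + \#\{v \in \mathcal{L}_n \mid v \text{ LS, not RS}\} \\
  &\leq \Delta\mathcal{C}(n) + \Delta\mathcal{C}(n) + \Delta\mathcal{C}(n)
   = 3\Delta\mathcal{C}(n).
\end{align*}
```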

4. Proof of Theorem 1.1

In the sequel, let us focus on infinite words $u$ whose languages are closed under reversal and such that the frequency of every factor exists.

(1) Such words are necessarily recurrent.

(2) For any pair of factors $w, v \in \mathcal{L}$, it holds that

$\frac{\#\{\text{occurrences of } w \text{ in } v\}}{|v|} = \frac{\#\{\text{occurrences of } \overline{w} \text{ in } \overline{v}\}}{|\overline{v}|}$.

Consequently, $\rho(w) = \rho(\overline{w})$ for all factors $w$ of $u$.

With the above two ingredients in hand, we will be able to prove an essential lemma. The proof of Theorem 1.1 will then be a direct consequence of this lemma.

Lemma 4.1. Let $u$ be an infinite word whose language $\mathcal{L}$ is closed under reversal and such that for each factor $w \in \mathcal{L}$, the frequency $\rho(w)$ exists. Then for every $n \in \mathbb{N}$, we have

$\#\{\rho(e) \mid e \in \mathcal{L}_{n+1}\} \leq \frac{1}{2}\bigl(\mathcal{P}(n) + \mathcal{P}(n+1) + \Delta\mathcal{C}(n) - X - Y\bigr) + Z$,

where $X$ is the number of BS factors of length $n$,
$Y$ is the number of BS palindromic factors of length $n$,
$Z$ is the number of RS factors of length $n$.

Proof. Let $\Gamma_n$ be the labeled Rauzy graph of $u$ of order $n$. Let us define a mapping $\mu$ which associates to every vertex $w \in \mathcal{L}_n$ the vertex $\overline{w}$ and to every edge $e \in \mathcal{L}_{n+1}$ the edge $\overline{e}$. Then $\mu^2 = \mathrm{Id}$ and, thanks to the closedness of $\mathcal{L}$ under reversal, $\mu$ maps $\Gamma_n$ onto itself; in fact, $\mu$ is an automorphism of $\Gamma_n$. Clearly, every simple path $w^{(0)}w^{(1)} \ldots w^{(m)}$ in $\Gamma_n$ is mapped by $\mu$ to the simple path $\overline{w^{(m)}} \ldots \overline{w^{(1)}}\,\overline{w^{(0)}}$. This implies that $\mu$ induces an automorphism of the reduced Rauzy graph $\tilde{\Gamma}_n$, too.

We already know that the set of edge labels of $\tilde{\Gamma}_n$ is equal to the set of edge labels of $\Gamma_n$. Let us denote by $A$ the number of edges $e$ in $\tilde{\Gamma}_n$ (i.e., the number of simple paths in $\Gamma_n$) such that $e$ is mapped by $\mu$ onto itself, and by $B$ the number of edges $e$ in $\tilde{\Gamma}_n$ such that $e$ is not mapped by $\mu$ onto itself. Then clearly

$\#\{e \mid e \text{ edge in } \tilde{\Gamma}_n\} = A + B$.

If $e$ is mapped by $\mu$ onto itself, then the corresponding simple path satisfies

$w^{(0)}w^{(1)} \ldots w^{(m)} = \overline{w^{(m)}} \ldots \overline{w^{(1)}}\,\overline{w^{(0)}}$;

hence, for $m$ even, its central vertex $w^{(m/2)}$ is a palindrome, and for $m$ odd, its central edge going from $w^{((m-1)/2)}$ to $w^{((m+1)/2)}$ is a palindrome. On the other hand, every palindrome of length $n+1$ is the central factor of a simple path mapped by $\mu$ onto itself, and every palindrome of length $n$ is either the central vertex of a simple path mapped by $\mu$ onto itself or is BS. Therefore,

(7)   $A = \mathcal{P}(n) + \mathcal{P}(n+1) - \#\{w \in \mathcal{L}_n \mid w \text{ BS in } \mathcal{P}al_n\}$.

We subtract the number of palindromic BS factors of $\mathcal{L}_n$, denoted by $Y$ in the statement, since they are not inner vertices of any simple path.

Now, let us turn our attention to the edges of $\tilde{\Gamma}_n$ which are not mapped by $\mu$ onto themselves. For every such edge $e$, at least one other edge, namely $\mu(e)$, has the same label $\rho(e)$. These considerations lead to the following estimate:

(8)   $\#\{\rho(e) \mid e \in \mathcal{L}_{n+1}\} \leq A + \frac{1}{2}B = \frac{1}{2}A + \frac{1}{2}(A + B)$.

Rewriting Equation (5), we obtain

$A + B = \Delta\mathcal{C}(n) + 2Z - X$.

This fact together with (7) and (8) proves the statement. □

If we apply the estimates (3) and (6) to $\mathcal{P}(n) + \mathcal{P}(n+1)$ and $Z$ from Lemma 4.1, respectively, we immediately obtain the proof of Theorem 1.1. In fact, we get an even finer upper bound:

(9)   $\#\{\rho(e) \mid e \in \mathcal{L}_{n+1}\} \leq 2\Delta\mathcal{C}(n) + 1 - \frac{1}{2}X - \frac{1}{2}Y$,

where $X$ is the number of BS factors of length $n$ and $Y$ is the number of BS palindromic factors of length $n$.
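Spelled out, the substitution of (3) and (6) into the bound of Lemma 4.1 reads as follows. The chain uses only quantities defined above, and its last step simply drops the non-negative terms $\frac{1}{2}X$ and $\frac{1}{2}Y$.

```latex
\begin{align*}
\#\{\rho(e) \mid e \in \mathcal{L}_{n+1}\}
  &\leq \tfrac{1}{2}\bigl(\mathcal{P}(n) + \mathcal{P}(n+1)
        + \Delta\mathcal{C}(n) - X - Y\bigr) + Z \\
  &\leq \tfrac{1}{2}\bigl(\Delta\mathcal{C}(n) + 2
        + \Delta\mathcal{C}(n) - X - Y\bigr) + \Delta\mathcal{C}(n) \\
  &= 2\Delta\mathcal{C}(n) + 1 - \tfrac{1}{2}X - \tfrac{1}{2}Y
   \;\leq\; 2\Delta\mathcal{C}(n) + 1.
\end{align*}
```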

Let us study for which infinite words equality in Theorem 1.1 is attained. Infinite words whose languages are closed under reversal are either purely periodic or aperiodic.

• In the case of purely periodic words, for sufficiently large $n$, the first difference of complexity is $\Delta\mathcal{C}(n) = 0$ and all factors of length $n$ have the same frequency.

• On the other hand, aperiodic words contain infinitely many BS factors. Hence, according to (9), the inequality in Theorem 1.1 is strict for infinitely many $n$.

This reasoning leads to the following corollary.

Corollary 4.2. Let $u$ be an infinite word whose language $\mathcal{L}$ is closed under reversal and such that for each factor $w \in \mathcal{L}$, the frequency $\rho(w)$ exists. Then the equality

$\#\{\rho(e) \mid e \in \mathcal{L}_{n+1}\} = 2\Delta\mathcal{C}(n) + 1$

holds for all sufficiently large $n$ if and only if $u$ is periodic.
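The bounds of Theorem 1.1 and (9) can be compared with numerical data for a concrete aperiodic word. The sketch below estimates the number of distinct frequencies of factors of length $n+1$ in the Fibonacci word, a Sturmian word whose language is closed under reversal, so that $2\Delta\mathcal{C}(n) + 1 = 3$. Since frequencies are measured on a finite prefix, two empirical values closer than a tolerance are treated as equal. The tolerance, the prefix length, and the function names are assumptions of this illustration, chosen to suit the small values of $n$ considered.

```python
from collections import Counter


def fibonacci_prefix(length):
    """Prefix of the Fibonacci word, the fixed point of 0 -> 01, 1 -> 0."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if c == "0" else "0" for c in word)
    return word[:length]


def count_distinct_frequencies(word, length, tol=0.01):
    """Number of clusters of empirical frequencies of factors of a given length.

    Two empirical frequencies closer than tol are treated as one value,
    because counts in a finite prefix only approximate the true frequencies.
    """
    counts = Counter(word[i:i + length] for i in range(len(word) - length + 1))
    total = sum(counts.values())
    freqs = sorted(c / total for c in counts.values())
    return 1 + sum(1 for a, b in zip(freqs, freqs[1:]) if b - a > tol)


prefix = fibonacci_prefix(20000)
for n in range(7):
    # Columns: n, estimated number of frequencies of factors of length n + 1
    print(n, count_distinct_frequencies(prefix, n + 1))
```

In this experiment the estimated count never exceeds 3, and it drops to 2 exactly for those $n$ for which a bispecial factor of length $n$ exists ($n = 0, 1, 3, 6$ in the range shown).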

5. Comments

(1) Berthé [3] has shown that for every Sturmian word, the number of frequencies of factors of length $n+1$ equals 2 if $\mathcal{L}_n$ contains a BS factor, and is equal to 3 otherwise. Since any BS factor of a Sturmian word is a palindrome, the finer upper bound in (9) is reached for all $n \in \mathbb{N}$.

(2) Ferenczi and Zamboni [10] have proved that infinite words coding a $k$-interval exchange transformation whose language is closed under reversal attain the upper bound in (9) for all $n \in \mathbb{N}$. As Sturmian words are infinite words coding a 2-interval exchange transformation, Item (1) is a particular case of their result.

(3) Another example of infinite words for which the upper bound in Theorem 1.1 is reached infinitely many times is given by the fixed points of the following substitution $\varphi$ on $\{0, 1\}$:

$\varphi(0) = 0^a1, \quad \varphi(1) = 0^b1, \quad a > b \geq 1$.

The substitution $\varphi$ is a canonical substitution associated with quadratic non-simple Parry numbers (for the precise definition see [9]).

(4) There exist infinite words whose languages are closed under reversal but which contain only a finite number of palindromes. For an example see [4]. For such words, Lemma 4.1 provides an even better estimate for all sufficiently large $n$:

$\#\{\rho(e) \mid e \in \mathcal{L}_{n+1}\} \leq \frac{3}{2}\Delta\mathcal{C}(n)$.

(5) The essential idea of our approach relies on the fact that the closedness of the language under reversal implies the existence of a non-trivial automorphism of the labeled Rauzy graph. More generally, our method can be applied to any infinite word whose language $\mathcal{L}$ possesses a symmetry $T : \mathcal{L} \to \mathcal{L}$ with the following properties:

(a) $T$ is a bijective map,

(b) for every $w, v \in \mathcal{L}$,

$\#\{\text{occurrences of } w \text{ in } v\} = \#\{\text{occurrences of } T(w) \text{ in } T(v)\}$.

Clearly, the mirror image map $w \mapsto \overline{w}$ satisfies both assumptions. A further example can be obtained if we choose a permutation $\pi$ of letters and define $T_\pi(w_1w_2 \ldots w_n) = \pi(w_1)\pi(w_2) \ldots \pi(w_n)$ for each factor $w_1w_2 \ldots w_n$. It may be shown that the group of all such symmetries $T$ is generated by the mirror image map and the mappings $T_\pi$.

(6) If the language of a binary word is closed under the exchange $\pi$ of letters (such words are called complementation-symmetric), no simple path is mapped by $\pi$ onto itself and, thus, each frequency is assigned to at least two edges in the reduced Rauzy graph $\tilde{\Gamma}_n$. As the number of edges is at most $3\Delta\mathcal{C}(n)$, we obtain for frequencies the same upper bound as in Item (4).

(7) The Thue-Morse sequence has, in the sense of Item (5), the most symmetrical language among binary words. This explains why the upper bound from Theorem 1.1 overestimates the actual number of factor frequencies. For concrete values of factor frequencies, consult Frid [11].
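The symmetries discussed in Items (5) to (7) can be explored experimentally. The following sketch generates a prefix of the Thue-Morse sequence and checks, for small lengths, that the set of factors found in the prefix is closed both under the mirror image map and under the exchange of letters. It also checks property (b) of Item (5) on a sample pair of words. All function names and the prefix length are assumptions of this illustration, and a check on a finite prefix is of course not a proof.

```python
def thue_morse_prefix(length):
    """Prefix of the Thue-Morse sequence: letter i is the parity of the binary digit sum of i."""
    return "".join(str(bin(i).count("1") % 2) for i in range(length))


def factors(word, n):
    """Set of factors of length n that occur in the finite word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def mirror(w):
    """Mirror image of the word w."""
    return w[::-1]


def exchange_letters(w):
    """Image of w under the letter exchange 0 <-> 1."""
    return w.translate(str.maketrans("01", "10"))


def occurrences(w, v):
    """Number of positions i such that the non-empty word w is a prefix of v[i:]."""
    return sum(1 for i in range(len(v) - len(w) + 1) if v.startswith(w, i))


prefix = thue_morse_prefix(4096)  # assumed long enough for the small n below
for n in range(1, 9):
    language_n = factors(prefix, n)
    closed_reversal = all(mirror(w) in language_n for w in language_n)
    closed_exchange = all(exchange_letters(w) in language_n for w in language_n)
    print(n, len(language_n), closed_reversal, closed_exchange)

# Property (b) of Item (5) for both symmetries on a sample pair (w, v).
w, v = "01", prefix[:64]
print(occurrences(w, v),
      occurrences(mirror(w), mirror(v)),
      occurrences(exchange_letters(w), exchange_letters(v)))
```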